The computing field has undergone a fundamental shift, from latency-optimized CPU designs to throughput-oriented GPU architectures. A CPU is like a high-speed courier motorcycle: it delivers a single package fast. A GPU is like a giant cargo ship: each item moves more slowly, but it carries 50,000 containers per trip.
1. Latency vs. Throughput
CPUs are designed to minimize the completion time of a single instruction sequence, relying on sophisticated techniques such as branch prediction. GPUs, by contrast, are designed to maximize work-per-second by running thousands of threads in parallel, trading single-thread speed for enormous aggregate throughput.
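The motorcycle/cargo-ship trade-off can be made concrete with a toy calculation. All numbers below are illustrative only, not measurements of real hardware:

```python
# Toy model of latency vs. throughput (illustrative numbers only).

def delivery_time(n_packages: int, latency_per_trip: float, batch_size: int) -> float:
    """Total time to deliver n_packages, moving one batch per trip."""
    trips = -(-n_packages // batch_size)  # ceiling division
    return trips * latency_per_trip

# "Motorcycle" (CPU-like): fast per trip, one package at a time -> low latency.
cpu_time = delivery_time(50_000, latency_per_trip=0.1, batch_size=1)

# "Cargo ship" (GPU-like): slow per trip, 50,000 packages at once -> high throughput.
gpu_time = delivery_time(50_000, latency_per_trip=100.0, batch_size=50_000)

print(f"motorcycle: {cpu_time:.0f} h for 50,000 packages")   # loses on bulk work
print(f"cargo ship: {gpu_time:.0f} h for 50,000 packages")   # wins on bulk work

# But for a SINGLE package, the low-latency motorcycle wins:
print(delivery_time(1, 0.1, 1) < delivery_time(1, 100.0, 50_000))
```

For bulk work the ship finishes 50x sooner despite each trip taking 1,000x longer, which is exactly why GPUs win on massively parallel workloads while CPUs win on single, latency-sensitive tasks.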
2. Transistor Allocation
At comparable price and power budgets, a GPU delivers far higher instruction throughput and memory bandwidth than a CPU. Built for highly parallel computation, the GPU devotes more of its transistors to data processing (arithmetic logic units), whereas the CPU devotes more of its transistors to data caching and flow control.
3. The Evolution of CUDA
CUDA (Compute Unified Device Architecture) was introduced by NVIDIA in 2006. It is a parallel computing platform and programming model that harnesses the GPU's power independently of graphics APIs, enabling dramatic performance gains.
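CUDA's core idea can be sketched without a GPU: a "kernel" function runs once per thread, and each thread computes a global index from its block and thread coordinates to pick the one element it handles. The following is a pure-Python simulation of that indexing scheme; the `launch` helper and function names are illustrative stand-ins, not real CUDA API calls (in CUDA C++ this would be a `__global__` kernel launched as `kernel<<<blocks, threads>>>(...)`):

```python
# Pure-Python simulation of the CUDA execution model: each simulated thread
# computes a global index i = blockIdx * blockDim + threadIdx and processes
# one element of the vectors.

def vector_add_kernel(a, b, out, block_idx, block_dim, thread_idx):
    """One simulated GPU thread: add a single pair of elements."""
    i = block_idx * block_dim + thread_idx   # global thread index
    if i < len(a):                           # guard: some threads have no work
        out[i] = a[i] + b[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Simulate launching grid_dim blocks of block_dim threads each."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(*args, block_idx, block_dim, thread_idx)

a = list(range(10))
b = [10 * x for x in a]
out = [0] * len(a)

# 3 blocks x 4 threads = 12 threads covering 10 elements (2 are masked off).
launch(vector_add_kernel, 3, 4, a, b, out)
print(out)  # [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]
```

On real hardware the two loops in `launch` do not exist: all twelve threads run concurrently, which is where the throughput comes from.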
QUESTION 1
Which component consumes the majority of silicon real estate in a traditional CPU?
Arithmetic Logic Units (ALUs)
Control logic and Data Caching
Floating Point Units
Memory Controllers
✅ Correct!
Correct! CPUs prioritize latency reduction, requiring large caches and complex control logic.
❌ Incorrect
ALUs dominate GPU die area, but in CPUs, Control and Cache are the primary consumers.
QUESTION 2
What was the original purpose of the GPU before CUDA?
General purpose scientific computing
Operating system kernel management
Fixed-function hardware for 3D rendering
High-frequency trading
✅ Correct!
Yes, GPUs started as fixed-function hardware specifically for accelerating real-time 3D graphics.
❌ Incorrect
Before CUDA (2006), GPUs were restricted to graphics APIs like OpenGL or DirectX.
QUESTION 3
In the cargo ship analogy, what represents the 'Throughput'?
The speed at which the ship moves across the ocean.
The total volume of containers delivered at once.
The size of the ship's engine.
The fuel efficiency per container.
✅ Correct!
Throughput is the aggregate 'work-per-second', like the massive number of containers moved per voyage.
❌ Incorrect
Speed represents latency; throughput represents the total volume of work completed.
QUESTION 4
What is the primary trade-off made by GPUs to achieve high aggregate throughput?
Higher power consumption per unit.
Lower single-thread performance.
Reduced memory bandwidth.
Simplified mathematical precision.
✅ Correct!
GPUs trade off individual thread speed (latency) to pack thousands of threads together for total throughput.
❌ Incorrect
Actually, GPUs usually offer much higher memory bandwidth than CPUs.
QUESTION 5
Which NVIDIA software component is required to run CUDA applications?
DirectX 12
NVIDIA Driver and CUDA Toolkit
OpenGL Wrapper
Windows GDI+
✅ Correct!
Correct. The CUDA Toolkit and the NVIDIA Driver form the bridge between your code and the hardware.
❌ Incorrect
CUDA enables workloads independent of graphics APIs like DirectX or OpenGL.
Architectural Analysis: CPU vs. GPU Selection
Determine the optimal processor for a given workload scenario.
A financial firm needs to process two different tasks: 1) A complex decision-making algorithm with hundreds of nested 'if-else' statements that must finish as fast as possible for a single user. 2) A Monte Carlo simulation that runs the same simple formula 10 million times with different random inputs.
Q
Which processor (CPU or GPU) should be used for Task 1 and why?
Solution:
Task 1 should use a CPU. Because it involves complex flow control and requires low latency for a single sequence, the CPU's sophisticated branch prediction and large caches are better suited than the GPU's throughput-oriented ALUs.
Q
Which processor should be used for Task 2 and why?
Solution:
Task 2 should use a GPU. It is an 'embarrassingly parallel' task where the same operation is repeated millions of times. The GPU can devote its massive array of ALUs to process thousands of these simulations simultaneously, achieving much higher total throughput.
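The Monte Carlo workload from Task 2 can be sketched in plain Python. The payoff formula below is a toy stand-in, not the firm's actual model; the point is that every trial is independent of every other, which is what makes the task 'embarrassingly parallel' and a natural fit for one-trial-per-thread execution on a GPU (here it runs sequentially for illustration):

```python
import random

def one_trial(rng: random.Random) -> float:
    """The 'same simple formula' applied to one random input: here, the
    payoff of a toy asset move (illustrative, not a real pricing model)."""
    shock = rng.gauss(0.0, 1.0)              # random market shock
    return max(shock, 0.0)                   # keep only the upside

def monte_carlo(n_trials: int, seed: int = 42) -> float:
    """Average the formula over n_trials independent random inputs.
    Each iteration depends on nothing but its own random draw, so the
    loop could be split across thousands of GPU threads unchanged."""
    rng = random.Random(seed)
    return sum(one_trial(rng) for _ in range(n_trials)) / n_trials

estimate = monte_carlo(100_000)
print(f"Monte Carlo estimate: {estimate:.4f}")
```

Because no trial reads another trial's result, there is no synchronization cost: the GPU version is the same formula mapped over millions of threads, exactly the shape of workload CUDA was built for.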